
    Clustering via nonparametric density estimation: an application to microarray data.

    Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, because the number of variables is much higher than the number of observations. Here, we present a novel approach to clustering of microarray data via nonparametric density estimation, based on the following steps: (i) selection of relevant variables; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Applications to simulated and real data show promising results in comparison with those produced by two standard approaches, k-means and Mclust. In the simulation studies, our nonparametric approach shows performance comparable to that of models based on the normality assumption, even in Gaussian settings. On the other hand, in two real benchmark datasets, it outperforms the existing parametric approaches.
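
The three steps listed in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. Variance ranking stands in for step (i) and projection on the first principal axis for step (ii). For step (iii), a one-dimensional Gaussian mean shift is swapped in as a generic mode-seeking procedure based on a kernel density estimate. All function names, the `bandwidth` parameter, and the merge tolerance are hypothetical.

```python
import math


def top_variance_columns(X, k):
    """Step (i), assumed criterion: indices of the k highest-variance variables."""
    n, p = len(X), len(X[0])
    variances = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / (n - 1))
    return sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]


def first_pc_scores(X, iters=200):
    """Step (ii), assumed reduction: scores on the first principal axis,
    found by power iteration on the centred data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    v = [float(j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        scores = [sum(z[j] * v[j] for j in range(p)) for z in Z]
        w = [sum(Z[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variation along the current direction
            break
        v = [c / norm for c in w]
    return [sum(z[j] * v[j] for j in range(p)) for z in Z]


def mean_shift_labels(x, bandwidth, iters=100):
    """Step (iii), swapped-in technique: move each point uphill on a Gaussian
    kernel density estimate, then group points that reach the same mode."""
    modes = []
    for start in x:
        m = start
        for _ in range(iters):
            weights = [math.exp(-0.5 * ((m - xi) / bandwidth) ** 2) for xi in x]
            m = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
        modes.append(m)
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) < 0.1 * bandwidth:  # hypothetical merge tolerance
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def cluster_pipeline(X, k, bandwidth):
    """Chain the three steps: select variables, reduce to 1-D, find modes."""
    keep = top_variance_columns(X, k)
    reduced = [[row[j] for j in keep] for row in X]
    return mean_shift_labels(first_pc_scores(reduced), bandwidth)
```

Calling `cluster_pipeline(X, k=2, bandwidth=1.0)` on a list of observation rows returns one integer label per row. The number of clusters is not fixed in advance; it follows from the number of density modes at the chosen bandwidth, which is the main practical difference from k-means.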

    Genetic Epidemiology of Taste Perception and Cigarette Use

    Cigarettes and other tobacco products contain bitter compounds, including nicotine, which contribute to the chemosensory properties of tobacco and stimulate multiple sensory systems, including taste transduction pathways. Since bitter taste has evolved to identify potentially toxic compounds, and thus protect against harmful foods, our hypothesis is that aversion to this taste may prevent smoking and nicotine dependence. The goal of this research was to investigate the role of inherited differences in taste perception in smoking behaviors. We sought to determine whether such genetic variation could account for the well-known differences in flavored tobacco use among different U.S. ethnic groups. For example, around 80% of African-American smokers report that they prefer menthol cigarettes, compared to only 30% of European-American smokers who express this preference. We recruited subjects from four different populations, comprising a total of 9871 individuals, purified DNA from some saliva or blood samples, and used a candidate gene approach to sequence taste-related genes. We identified several genetic associations between polymorphisms in taste-related genes and different smoking behaviors. We have shown that the frequency of the TAS2R38 taster haplotype differs between smokers and non-smokers in European-American populations. In addition, we identified two SNPs, one located in the menthol receptor TRPM8 and one in the menthol reactive gene TRPA1, that are strongly associated with menthol smoking in a study group of African-Americans. Moreover, we found that the taster haplotype of the TAS2R38 bitter taste receptor gene is less common in European-American smokers and that the frequency of the non-taster haplotype of this gene is significantly lower in menthol smokers compared to non-menthol smokers. Overall, these findings support the hypothesis that variations in taste-related genes play a role in the choice of cigarettes when smoking. Understanding genetic differences in taste perception in tobacco use could help inform the development of more effective tobacco control policies.

    Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.

    Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.

    ROC estimation and threshold selection criteria in three-class classification problems for clustered data

    Statistical evaluation of diagnostic tests, and, more generally, of biomarkers, is a constantly developing field, in which the complexity of the assessment increases with the complexity of the design under which data are collected. One particularly prevalent type of data is clustered data, where individual units are naturally nested into clusters. In these cases, bias can arise when cluster-level effects and/or individual covariates are omitted from the evaluation process. Focussing on the three-class case and on continuous-valued diagnostic tests, we investigate how to exploit the clustered structure of data within a linear mixed model approach, both when the assumption of normality holds and when it does not. We provide a method for estimation of covariate-specific ROC surfaces and discuss methods for the choice of optimal thresholds, proposing three possible estimators. A proof of consistency and asymptotic normality of the proposed threshold estimators is given. All considered methods are evaluated by extensive simulation experiments. As an application, we study the use of the Lysosomal Associated Membrane Protein Family Member 5 (Lamp5) gene expression as a biomarker to distinguish among three types of glutamatergic neurons.
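
To make the idea of threshold selection in a three-class problem concrete, here is a minimal empirical sketch. It is not one of the three estimators proposed in the paper, which the abstract does not name, and it ignores both the clustered structure and covariates. It assumes ordered classes 0 < 1 < 2 and picks the threshold pair that maximizes the sum of the three true class fractions, a criterion commonly described as a three-class generalization of the Youden index. The function name and the classification rule are hypothetical.

```python
from itertools import combinations


def three_class_thresholds(scores, labels):
    """Grid search for a threshold pair (t1, t2) with t1 < t2.

    Assumed rule: score <= t1 -> class 0, t1 < score <= t2 -> class 1,
    score > t2 -> class 2. Returns (t1, t2, criterion), where the criterion
    is the sum of the three empirical true class fractions, or None if
    fewer than two distinct score values are available.
    """
    by_class = {k: [s for s, y in zip(scores, labels) if y == k] for k in (0, 1, 2)}
    if any(len(v) == 0 for v in by_class.values()):
        raise ValueError("each of the three classes needs at least one score")

    candidates = sorted(set(scores))  # observed values as candidate cut-points
    best = None
    for t1, t2 in combinations(candidates, 2):  # sorted input guarantees t1 < t2
        tcf0 = sum(s <= t1 for s in by_class[0]) / len(by_class[0])
        tcf1 = sum(t1 < s <= t2 for s in by_class[1]) / len(by_class[1])
        tcf2 = sum(s > t2 for s in by_class[2]) / len(by_class[2])
        criterion = tcf0 + tcf1 + tcf2
        if best is None or criterion > best[2]:
            best = (t1, t2, criterion)
    return best
```

The search is quadratic in the number of distinct scores, which is fine for an illustration. A model-based approach such as the one the abstract describes would instead derive thresholds from fitted class distributions, so that cluster effects and covariates can enter the estimate.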

    Assessment of statistical methods from single cell, bulk RNA-seq, and metagenomics applied to microbiome data

    Background: The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has produced methods to deal with the high sparsity and compositionality characteristic of microbiome data, but independent benchmarks comparing these to alternatives developed for RNA-seq data analysis are lacking. Results: We compare methods developed for single-cell and bulk RNA-seq, and specifically for microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, concordance, power, and correct identification of differentially abundant genera. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing. Conclusions: The multivariate and compositional methods developed specifically for microbiome analysis did not outperform univariate methods developed for differential expression analysis of RNA-seq data. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner.

    Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies

    Background: In the last decades, microarray technology has spread, leading to a dramatic increase in publicly available datasets. The first statistical tools developed focused on the identification of significantly differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while gene set analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question, to fully exploit public gene expression datasets. Results: We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of an individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other gene set analysis algorithms do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. Conclusions: STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and information about the biological function of genes. The use of the STEPath-computed gene set scores overcomes batch effects in meta-analysis approaches, allowing the direct comparison of different pathologies and different studies at the level of gene set activation.

    Combinatorial Expression of Grp and Neurod6 Defines Dopamine Neuron Populations with Distinct Projection Patterns and Disease Vulnerability

    Midbrain dopamine neurons project to numerous targets throughout the brain to modulate various behaviors and brain states. Within this small population of neurons exists significant heterogeneity based on physiology, circuitry, and disease susceptibility. Recent studies have shown that dopamine neurons can be subdivided based on gene expression; however, the extent to which genetic markers represent functionally relevant dopaminergic subpopulations has not been fully explored. Here we performed single-cell RNA-sequencing of mouse dopamine neurons and validated studies showing that Neurod6 and Grp are selective markers for dopaminergic subpopulations. Using a combination of multiplex fluorescent in situ hybridization, retrograde labeling, and electrophysiology in mice of both sexes, we defined the anatomy, projection targets, physiological properties, and disease vulnerability of dopamine neurons based on Grp and/or Neurod6 expression. We found that the combinatorial expression of Grp and Neurod6 defines dopaminergic subpopulations with unique features. Grp+/Neurod6+ dopamine neurons reside in the ventromedial VTA, send projections to the medial shell of the nucleus accumbens, and have noncanonical physiological properties. Grp+/Neurod6- dopamine neurons are found in the VTA as well as in the ventromedial portion of the SNc, where they project selectively to the dorsomedial striatum. Grp-/Neurod6+ dopamine neurons represent a smaller VTA subpopulation, which is preferentially spared in a 6-OHDA model of Parkinson's disease. Together, our work provides detailed characterization of Neurod6 and Grp expression in the midbrain and generates new insights into how these markers define functionally relevant dopaminergic subpopulations.